ABSTRACT

Various rumors and assumptions have circulated about
COVID-19 immunization, making it a heated subject of
discussion in India. This prompted a reaction from the
country's populace, who expressed favorable, negative,
and neutral views through tweets and retweets on
Twitter. Together these tweets form a mass of
unstructured data. The purpose of this study is to take
advantage of Twitter's massive data pool and extract
the sentiment and insights implied by it. Comprehensive
research on people's feelings may help us arrive at a
fair understanding of the general population's point of
view on preventing disease by vaccination. A dataset of
vaccination-related tweets from 2020 to 2021 was
collected for the study, amounting to the mining of
16,05,152 tweets related to vaccination.

How to cite this paper: Ms. Tanzeela Qureshi | Dr. Mohit Singh Tomar | Dr. Ritu Shrivastava "An Assessment of Sentiment Analysis of Covid-19 Tweets" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7 | Issue-5, October 2023, pp.534-543, URL: www.ijtsrd.com/papers/ijtsrd59976.pdf

I. INTRODUCTION

The COVID-19 epidemic caused global turmoil and
ravaged every country on Earth. All hope hinged on a
vaccine because of the mutative and aggressive
character of the virus. Pfizer, Moderna, the makers of
Covishield, and many other international organizations
worked hard to develop an effective vaccine. Even so,
the notion that some adverse effects of vaccination are
unavoidable was not well absorbed by the general
population, despite a clear majority of approval. Many
people's opinions are influenced by what they read or
hear in the mainstream and social media. Consequently,
social media played a crucial role in communication
and expression of thoughts about vaccines, with
Twitter in particular playing a pivotal role due to its
features that allow users to tweet (i.e., express
an opinion), retweet (i.e., support an opinion), and
comment on or like posts, reaching a wider audience.


With over 500 million tweets sent every day, Twitter
is a treasure trove of information that may be mined
for insights if used correctly. Many academic
investigations have used Twitter data. In India,
Twitter served as a platform for individuals to openly
discuss the topic of vaccination via tweets, retweets,
etc. Many insightful conclusions may be derived by
analysing people's moods based on the content of
their tweets using sentiment analysis techniques.
Opinion mining, or sentiment analysis, is a
data-analysis method that establishes whether a piece
of text is positive, negative, or neutral.


Therefore, the purpose of this research is to provide
substantial insights by analysing the sentiment of
tweets about vaccines. To that end, the study applies
Sentiment Analysis and exploratory data analysis to
the collected Twitter data. The results of this
study will shed light on how the general public
feels about COVID-19 vaccinations.


The study is organized as follows. Section 2 reviews
prior studies that are pertinent to the topic at hand.
Sentiment analysis is defined and briefly discussed in
Section 3. The dataset that was utilized for this
analysis is described in Section 4. In Section 5, we
detail the methodology and the findings from our
exploratory data analysis of the dataset. Experiment
findings and key insights are presented in Section 6,
and conclusions and future plans in Section 7.
II. LITERATURE WORK

Opinion mining, or sentiment analysis, uses the
capabilities of Natural Language Processing (NLP) to
analyze the sentiment expressed in a given body of
data, and it has been employed and adapted in diverse
studies over time. Previous studies that have shed
light on this topic are discussed below.


In paper [1], the BERT model is used to perform
Sentiment Analysis on Twitter data. Tweets were
geotagged in order to classify the data utilized in
that work. The BERT model for emotion categorization
was trained on the data, and an SVM classifier was
used to assess the model's effectiveness. On the
whole, the acquired data was accurate to within 4%.
Paper [2] presents an analytical framework for the
impact of COVID-19 on the stock market based on tweets
during the outbreak. Supervised learning was used to
train this model, which achieved an accuracy of 86.24
percent. The studies were conducted after the
Coronavirus epidemic to aid businesses in forecasting
stock prices, identifying new marketing opportunities,
and monitoring their own growth. Paper [3] analyzes
what people were tweeting about most during and after
the first outbreak of the COVID-19 pandemic. For topic
extraction, the authors used Latent Dirichlet
Allocation (LDA), and for sentiment analysis, they
relied on a lexicon-based strategy. The paper
summarizes the concerns of different groups during the
early stages of the epidemic. Using a dataset of
600,000 English-language tweets, the model was trained
on 80% of the data and then tested on the remaining
20%. The article used sentiment analysis to illustrate
people's thoughts on the most discussed issues.


Paper [4] examines tweets from across all of
India's states during the months of November 2019
and May 2022. The authors applied sentiment analysis
to the gathered information and found that, on the
whole, Indians had an optimistic outlook.


There was a correlation between the number of
confirmed cases of COVID-19 in a given state and the
number of tweets sent from that state. The research in
[5] provides a comprehensive analysis of the tone of
tweets related to COVID-19. The authors evaluated the
tone of the tweets using Logistic Regression, VADER
sentiment analysis, and BERT sentiment analysis. In
order to analyze public opinion on the issue of
Coronavirus, the authors of paper [6] combine data
from two sources: the textual tweets posted in April
2020 from six nations and the tweets of the top 10
politicians. The paper presents findings that shed
light on the similarities and differences in public
opinion among nations. The results showed that across
all six nations, the most strongly expressed emotions
were "trust," "fear," and "anticipation."
In article [7], sentiment analysis using TF-IDF word
weighting and Logistic Regression was performed on
Twitter data from 30th April 2020. This approach
classified the sentiment of the tweets with an
accuracy of 94.71%.


Our understanding of the prior studies in this area was 
much enhanced by this literature study. Our project's 
trajectory is now clearer thanks to this. 


III. SENTIMENT ANALYSIS

An application of Natural Language Processing
(NLP), sentiment analysis classifies data and texts to
reveal how people feel about a topic [8]. This helps in
understanding the author's intentions and point of
view. The technique uses a scoring system that
summarizes the attitude expressed in the text. Using
these scores, we can more quickly identify positive,
negative, and neutral aspects of the material.
Businesses regularly use sentiment analysis (also
called opinion mining or "Emotion AI") to get insight
into how customers and the wider public feel about a
brand or product.


To gauge public opinion about COVID vaccinations,
we apply Sentiment Analysis to data gathered from
Twitter after the second wave of Coronavirus. The
study's findings may shed light on the public's
thoughts and feelings towards COVID-19 vaccines.


There are two main phases to any sentiment analysis: 

1. Preprocessing, cleaning, and selecting features
from datasets

2. Applying Sentiment Analysis to the Data 


IV. DATASET DESCRIPTION 

The project began with information collection and
classification. For this study, we analyzed data from
the 'Covid-19 All Vaccine Tweets' collection. The
data, which covers the period from December 2020 to
August 2021 and consists of 80,418 tweets, was
acquired from kaggle.com [9]. Table 1 lists the
attributes and provides explanations for each.
Table 1: Attributes of the dataset and their description

ATTRIBUTES          DESCRIPTION
id                  ID of the tweet
user_name           User name of the person who has tweeted
user_location       Location of the person who has sent the tweet
user_description    Twitter bio of the person writing the tweet
user_created        When the Twitter account of the user was created
user_followers      Number of followers of the person sending the tweet
user_friends        Number of friends of the person sending the tweet
user_verified       Binary value specifying whether the user is verified on Twitter or not
date                Date and time when the tweet was sent
text                The text of the tweet as it is
hashtags            All the hashtags that were used in the tweet
source              Source (device or application) from which the tweet was sent
retweets            Number of times the tweet was retweeted
favourites          Number of people who have marked the tweet as a 'favourite'
is_retweet          Whether the tweet is a retweet or a new one


Following this, the tweets in the dataset were cleaned by removing mentions, hashtags, retweet information, and links. Time stamps for tweets were also removed since they were deemed unnecessary. Some of the most salient attributes from the list above were then chosen for exploratory analysis.
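
The cleaning step described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the regular expressions and the function name `clean_tweet` are assumptions made for the example and are not taken from the implementation used in this study.

```python
import re

# Illustrative patterns (assumptions, not the study's exact rules).
RETWEET_RE = re.compile(r"^RT\s+@\w+:?\s*")      # leading "RT @user:" prefix
URL_RE = re.compile(r"https?://\S+|www\.\S+")    # http(s) and www links
MENTION_RE = re.compile(r"@\w+")                 # @mentions
HASHTAG_RE = re.compile(r"#\w+")                 # #hashtags (tag text is dropped too)
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove retweet markers, links, mentions, and hashtags from one tweet."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = HASHTAG_RE.sub("", text)
    # Collapse the gaps left behind by the removed tokens.
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    raw = "RT @who: Got my #Covishield dose today! https://t.co/abc @friend"
    print(clean_tweet(raw))
```

Whether the hashtag text itself should be kept (dropping only the `#` symbol) is a design choice; the sketch drops the whole tag, in line with the description of hashtags being removed.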


V. METHODOLOGY 
This section explains in depth how Sentiment Analysis was carried out on the selected dataset. 


Gathering information that may be used in the analysis was the first stage. This was discussed at length in the preceding section. The dataset contains clean, pre-processed data. Eighty three hundred and six records remained after duplicate entries were removed from the dataset. The tweets were then cleaned by removing any traces of mentions, hashtags, retweets, links, etc. We also removed the tweet timestamps. Then, a subset of the attributes listed in Table 1 was chosen since it was more relevant to the data analysis being conducted. After a graphical study of the data was performed, a few key graphs were plotted.


The number of tweets sent from each device type is displayed in Fig. 1. The majority of tweets were sent from 
Android devices, followed by the Twitter Web App, and then the Cowin Vaccination Availability platform, as 
seen in the provided scatter plot. 


[Plot: number of tweets (axis from 0 to 30000) per source; labels shown: Twitter Web App, Twitter for Android, Twitter for iPhone, TweetDeck, VaxBir]


Figure 1: Plot showing the source of the tweets posted 


Figure 2 displays the distribution of tweets between verified and unverified accounts. The plot reveals that about 10% of tweets came from verified accounts, while the remaining 90% came from unverified ones.

Figure 2: Plot showing the number of tweets sent from verified or unverified accounts


The most popular tweets about COVID-19 vaccinations were identified by taking the top 10 most retweeted tweets from the dataset. They are shown in Fig. 3 below.

Figure 3: Top 10 most retweeted tweets related to COVID-19 vaccine.

The top 20 accounts by tweet frequency were also identified. They are shown in Fig. 4 below.

Figure 4: Top 20 accounts based on the frequency of the tweets


The tweets were divided into three classes according to their polarity value: positive (1), negative (-1), and neutral (0). Tweets with polarity values between -1 and -0.01 were labeled as "Negative," those with polarity values between -0.01 and 0.01 were labeled as "Neutral," and those with polarity values between 0.01 and 1 were labeled as "Positive." The number of tweets that fall into each of these three classes is shown in Fig. 5.

Figure 5: Tweet count given as positive, negative or neutral class
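
The thresholding rule above can be written as a short function. The sketch below is illustrative only: the name `label_polarity` is hypothetical, and because the text does not say which class the boundary values -0.01 and 0.01 belong to, the sketch assumes they are treated as neutral.

```python
from collections import Counter


def label_polarity(polarity: float, threshold: float = 0.01) -> str:
    """Map a polarity score in [-1, 1] to Negative / Neutral / Positive.

    Assumption: scores exactly equal to +/- threshold count as Neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity < -threshold:
        return "Negative"
    if polarity > threshold:
        return "Positive"
    return "Neutral"


def count_labels(polarities):
    """Count how many scores fall into each of the three classes."""
    return Counter(label_polarity(p) for p in polarities)


if __name__ == "__main__":
    # Made-up scores, used only to exercise the rule.
    print(count_labels([-0.6, -0.005, 0.0, 0.2, 0.9]))
```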
Figure 6 shows the CDF of tweet sentiments and the distribution of tweet sentiments throughout the sample.

Figure 6: Distribution and CDF of Sentiments across tweets in the dataset
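
For readers unfamiliar with the CDF in Figure 6, the empirical cumulative distribution function of a set of polarity scores gives, for any value x, the fraction of tweets whose score is at most x. The following is a minimal standard-library sketch; the helper name `empirical_cdf` and the sample scores are made up for illustration and do not reproduce the figure.

```python
from bisect import bisect_right


def empirical_cdf(scores):
    """Return a function F where F(x) is the fraction of scores <= x."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one score is required")

    def cdf(x: float) -> float:
        # bisect_right counts how many sorted scores are <= x.
        return bisect_right(ordered, x) / n

    return cdf


if __name__ == "__main__":
    cdf = empirical_cdf([-0.5, 0.0, 0.0, 0.8])  # illustrative polarity scores
    for x in (-1.0, 0.0, 1.0):
        print(x, cdf(x))
```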


Next, we identified the most frequently used terms in the most positive and the most negative tweets in the dataset and created word clouds for them. Fig. 7 below illustrates this.


The word clouds show that the most prevalent terms in the positive tweets about the COVID-19 vaccinations were "good," "thank," "effective," "vaccinated," "great," "happy," "safe," etc. In the negative tweets, frequent terms included "emergency," "forced," "alone," "Canada," "halt," "second," "death," "India," "Ontario," etc.


[Word cloud panels: "Common Words Among Most Positive Tweets" and "Common Words Among Most Negative Tweets"]


Figure 7: Word Clouds for the common words among the most positive and most negative tweets 
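
The term counts that underlie such word clouds can be obtained with a simple frequency count. The sketch below assumes a tiny hand-written stop-word list and a basic letters-only tokenizer, both chosen for illustration; the function name `top_terms` is hypothetical and the rendering of the cloud itself is not shown.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the study).
STOPWORDS = frozenset({
    "the", "a", "an", "is", "are", "was", "and", "or", "to",
    "of", "in", "for", "my", "it", "this", "be",
})


def top_terms(tweets, n=10):
    """Return the n most frequent non-stop-word terms across the tweets."""
    counts = Counter()
    for tweet in tweets:
        for token in re.findall(r"[a-z']+", tweet.lower()):
            # Skip stop words and very short tokens.
            if token not in STOPWORDS and len(token) > 2:
                counts[token] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        "The vaccine is safe and effective",
        "Happy to be vaccinated, the vaccine was great",
    ]
    print(top_terms(sample, n=3))
```

Applying the same function separately to the most positive and the most negative tweets would give the two term lists that the word clouds visualize.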


To go further, we plotted word clouds from tweets about a select number of nations and places. Figures 8, 9, and 10 show them below.
Figure 8: Most common words in tweets related to India

Figure 9: Most common words in tweets related to USA


[Word cloud; legible terms include: production, phases, India, Vaccine, PfizerBioNTech, Covishield, ICMR, happy, Vaccination]

Figure 10: Most common words in tweets related to Mumbai 


Because vaccination options in India were limited, we created a word cloud from tweets about Covaxin and Covishield. Fig. 11 displays this.


Figure 11: Word cloud for the tweets about Covishield and Covaxin
The data was then processed further to generate colour-coded word clouds for the divided tweets. The data was cleaned once again, and the tweets were filtered to remove grammatical or spelling errors and meaningless text. After that, the whole Twitter dataset was classified into positive, negative, and neutral categories, and a corresponding word cloud was formed for each. Fig. 12 shows the result.

Figure 12: Colour-coded word clouds for all the tweets


Similarly, the colour-coded word clouds for Covishield and Covaxin were also plotted and are shown in Fig. 13 below.

Figure 13: Colour-coded word clouds for the Covishield and Covaxin


A. Polarity 

A word's polarity is the degree to which it expresses a negative, neutral, or positive emotion. Words with a positive polarity have a value of 1, neutral words have a polarity of 0, and negative words have a value of -1. The polarity of a tweet is calculated by taking the mean of the polarity values of all the words in it, which gives a float value between -1 and +1. The polarity of a tweet is thus a metric that places the tweet's emotional tone into the positive, negative, or neutral category.
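
The averaging described above can be illustrated with a toy example. In the sketch below, `TOY_LEXICON` is a hypothetical word list invented for the illustration (its entries echo terms seen in the word clouds), words not in the lexicon are assumed to be neutral (0), and `tweet_polarity` is not the scoring routine used in this study.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
# Any word not listed here is treated as neutral (0).
TOY_LEXICON = {
    "good": 1, "safe": 1, "happy": 1, "effective": 1, "great": 1,
    "forced": -1, "death": -1, "halt": -1,
}


def tweet_polarity(text: str, lexicon=TOY_LEXICON) -> float:
    """Mean word polarity of a tweet, a float between -1 and +1."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0  # an empty tweet is treated as neutral
    return sum(lexicon.get(word, 0) for word in words) / len(words)


if __name__ == "__main__":
    print(tweet_polarity("safe and effective"))
    print(tweet_polarity("forced halt"))
```

Because every word contributes a value in {-1, 0, +1}, the mean always stays within the interval from -1 to +1, which matches the range stated above.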


B. Subjectivity 
The ratio of subjective to objective content in a tweet or paragraph depends on the writer. The degree to which a text is subjective rises as more personal opinion is included and falls as more objective data is presented. It is a proxy for the author's degree of personal involvement in the tweet or other source content.


A small set of general statements about vaccination from the dataset was then chosen to test polarity and subjectivity in terms of sentiment. Figures 14 and 15 show the polarity and subjectivity scores.

Figure 14: Polarity score of the tweets

Figure 15: Subjectivity score of the tweets


The following is a flowchart depicting the research procedure that was used. 


Data Collection
   -> Preprocessing & Cleaning of Twitter Data
   -> Sentiment Analysis
   -> Data Analysis & Results


Figure 16: A flowchart of the methodology followed for implementation of this project 


VI. RESULTS AND DISCUSSIONS 

In this study, 16,05,152 tweets and retweets were
analysed for attitudes toward immunization in India.
After being cleaned, pre-processed, and having
duplicates removed, the data utilized in the research
was ready for analysis. The tweets had their
timestamps, mentions, hashtags, retweet markers, and
links deleted. Word clouds for various subsets of the
data were generated, and key graphs such as sentiment
label counts and sentiment distributions were plotted
to get the gist of the data. Additionally, a few
general statements about vaccination were chosen and
examined for polarity and subjectivity to determine
how they were received. Data analysis shows that the
majority of Indians have a favourable opinion about
vaccination, but that there is still a strong negative
attitude towards the practice in the country.


VII. CONCLUSION AND FUTURE SCOPE 

The public has been affected in a variety of ways by
the coronavirus, and this sort of study may aid
government and other research organizations in
comprehending public sentiment and filling in
knowledge gaps. Twitter data analysis is crucial since
Twitter is a place where many individuals express
their honest, sometimes controversial, opinions. With
the help of Natural Language Processing (NLP) measures
such as subjectivity and polarity, the study analyses
public views expressed via tweets on the Twitter
network and presents the resulting analysis as
outputs in the form of graphs and tables.
This study's findings highlight the need for increased
vaccination awareness and provide new insight into
the factors that make some individuals feel uneasy
about being vaccinated.


Since it is crucial to grasp the public's mood in a
variety of scenarios, this research has significant
future potential. Sentiment analysis may be conducted
again to examine public opinion on the third wave,
public opinion on immunization delay in India, and
similar subjects. There is a wealth of relevant data at
our disposal, which we may decipher and analyse for
use as a springboard for future action. Any collection
of text may be used as a dataset for analysing the
tone of tweets or other content. In today's fast,
data-driven world, having a metric to better
comprehend the reasoning behind opinions is critical.